Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NHWC] asm igemm xdlops fwd/bwd fp32/fp16 kernel #958

Merged
merged 20 commits into from
Jun 23, 2021

Conversation

carlushuang
Copy link
Contributor

@carlushuang carlushuang commented May 28, 2021

This is PR of asm igemm for NHWC layout. Currently this is for fwd/bwd, fp32/fp16.

  • NHWC fwd fp32/fp16 kernel and solver ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC
  • NHWC bwd fp32/fp16 kernel and solver ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC
  • make sure working properly in both Heuristic mode and tuning mode
  • add ctest for testing

tested these cases: perf_data
Most of the files in this PR is generated by iGEMMgen, with a single .s file for a single kernel. Apart from this, only several cpp files have changes. Below is list of non-asm files that have changes or added

src/CMakeLists.txt
src/conv/invokers/impl_gemm_dynamic.cpp
src/include/miopen/conv/asm_implicit_gemm.hpp
src/include/miopen/conv/invokers/impl_gemm_dynamic.hpp
src/include/miopen/solver.hpp
src/mlo_dir_conv.cpp
src/solver.cpp
src/solver/conv_asm_implicit_gemm_gtc_bwd_nhwc.cpp
src/solver/conv_asm_implicit_gemm_gtc_fwd_nhwc.cpp
src/solver/conv_asm_implicit_gemm_gtc_perf_config.cpp
test/CMakeLists.txt

@carlushuang carlushuang marked this pull request as draft May 28, 2021 14:48
@carlushuang carlushuang changed the title [NHWC] fwd/bwd fp32/fp16 kernel [NHWC] asm igemm xdlops fwd/bwd fp32/fp16 kernel May 28, 2021
@ghost
Copy link

ghost commented May 29, 2021

Congratulations 🎉. DeepCode analyzed your code in 2.149 seconds and we found no issues. Enjoy a moment of no bugs ☀️.

👉 View analysis in DeepCode’s Dashboard | Configure the bot

👉 The DeepCode service and API will be deprecated in August, 2021. Here is the information how to migrate. Thank you for using DeepCode 🙏 ❤️ !

If you are using our plugins, you might be interested in their successors: Snyk's JetBrains plugin and Snyk's VS Code plugin.

@carlushuang carlushuang marked this pull request as ready for review May 31, 2021 07:37
@atamazov
Copy link
Contributor

@carlushuang Could you please add urgency and value labels. Thanks.

@codecov

This comment has been minimized.

@carlushuang
Copy link
Contributor Author

@atamazov Hi please help review this PR, now this PR have passed CI for once and framework team need this feature to do NHWC e2e test

@jerryyin
Copy link
Member

jerryyin commented Jun 4, 2021

I have little idea of how to interpret/validate assembly kernels. Abstain from reviewing this PR, but thanks for your contribution.

asroy
asroy previously approved these changes Jun 15, 2021
Copy link
Contributor

@asroy asroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I've only reviewed the host side code.

src/conv/invokers/impl_gemm_dynamic.cpp Outdated Show resolved Hide resolved
src/conv/invokers/impl_gemm_dynamic.cpp Outdated Show resolved Hide resolved
@carlushuang carlushuang requested a review from atamazov June 16, 2021 10:31
src/conv/invokers/impl_gemm_dynamic.cpp Outdated Show resolved Hide resolved
src/conv/invokers/impl_gemm_dynamic.cpp Outdated Show resolved Hide resolved
@junliume
Copy link
Collaborator

[2021-06-17T09:09:23.503Z] Backward weights convolution: ConvAsmBwdWrW1x1
[2021-06-17T09:09:23.503Z] Input tensor: 64, 128, 28, 28
[2021-06-17T09:09:23.503Z] Weights tensor: 512, 128, 1, 1
[2021-06-17T09:09:23.503Z] Output tensor: 64, 512, 28, 28
[2021-06-17T09:09:23.503Z] Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
[2021-06-17T09:09:23.503Z] ../bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --input 64, 160, 73, 73 --weights 64, 160, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
[2021-06-17T09:09:23.503Z] FAILED: 0.000103983
[2021-06-17T09:09:23.503Z] Max diff: 738
[2021-06-17T09:09:23.503Z] Mismatch at 940430: -146 != -427

@carlushuang
Copy link
Contributor Author

CI all passed
@junliume the fail about ConvAsmBwdWrW1x1 is not introduced by this PR. (This is direct algorithm), and after I re-start that test, all passed

junliume
junliume previously approved these changes Jun 18, 2021
@carlushuang carlushuang requested a review from atamazov June 19, 2021 03:13
@atamazov
Copy link
Contributor

@junliume Thanks for the info from CI! Please note that correct quotation from the log is:

[2021-06-17T09:09:23.503Z] ../bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --input 64, 160, 73, 73 --weights 64, 160, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
[2021-06-17T09:09:23.503Z] FAILED: 0.000103983
[2021-06-17T09:09:23.503Z] Max diff: 738
[2021-06-17T09:09:23.503Z] Mismatch at 940430: -146 != -427
[2021-06-17T09:09:23.503Z] Forward convolution: ConvOclDirectFwd1x1
[2021-06-17T09:09:23.503Z] Input tensor: 64, 160, 73, 73
[2021-06-17T09:09:23.503Z] Weights tensor: 64, 160, 1, 1
[2021-06-17T09:09:23.503Z] Output tensor: 64, 64, 73, 73
[2021-06-17T09:09:23.503Z] Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1}, 

and faulty Solver is ConvOclDirectFwd1x1.

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocked by #991

test/CMakeLists.txt Outdated Show resolved Hide resolved
atamazov
atamazov previously approved these changes Jun 23, 2021
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None objections!

@atamazov
Copy link
Contributor

@carlushuang ...but I still need an answer for this: #958 (comment)

# RESOLVED Conflicts:
#	src/solver.cpp
@atamazov
Copy link
Contributor

CI run 30 has passed all the tests.

@atamazov atamazov merged commit 6c0b289 into develop Jun 23, 2021
@atamazov atamazov deleted the asm_igemm_nhwc_fwd_bwd branch June 23, 2021 23:40
atamazov pushed a commit that referenced this pull request Jul 22, 2021
* add NHWC fwd/bwd fp32/fp16 kernel
* fix several bug in Herustic and Tuning
* add missing kernel and update config list
* remove useless config lists
* fix fwd fp32 not valid config
* fix isValid
* fix bwd fp16 not proper IsApplicable
* add ctest for nhwc asm kernels
* reorg NextLinear
* parse in opArgs as mutable, reserve place for in/wei/out pointer outside lambda
* Assign OpKernelArg() to opArgs vector, set proper ctest flag
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants